-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regex to catch HTML tags #398
Conversation
CI was failing on an un-related test:
It is due to this commit on I fixed the expected result in ec14a49. |
Thank you for fixing this! |
not to forget this dev change: 8c6163b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reverts commit 8c6163b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm !
This PR fixes the CI for https://github.com/huggingface/course and in particular ./es/chapter5/5.mdx.
Related to this slack thread (internal) and this failing CI.
The failing doc had this form:
in which the
_re_lt_html
regex detected<n<100K \n<Tip>
as a single HTML tag. The_re_lt_html
is meant to detect the<
characters that are not part of a HTML tag.I fixed the regex for this use case + added a regression test for it. All previous unit tests are still passing so hopefully it doesn't break something in the wild. I also took the liberty to add verbose mode to explain a bit more what the regex is doing (too me a bit of time to remember 🙄).
cc @mishig25 @MKhalusova @xenova
(also related to #373 and #394 which introduced and modified this regex)